World WordNet Database Structure: An Efficient Schema for Storing Information of WordNets of the World
Abstract
WordNet is an online lexical resource that expresses the unique concepts of a language. The English WordNet, developed at Princeton University, was the first WordNet. Over time, various organizations all over the world have developed WordNets for many other languages. Storing WordNet data has always been a challenge: some WordNets are stored in the file system, while others are stored using different database models. In this paper, we present the World WordNet Database Structure, which can be used to efficiently store the WordNet information of all languages of the world. The design can be adopted by most language WordNets to store information such as synset data, semantic and lexical relations, ontology details, language-specific features, and linguistic information. We also attempt to develop Application Programming Interfaces for manipulating the data in these databases. This database structure can help in various Natural Language Processing applications such as Multilingual Information Retrieval, Word Sense Disambiguation, and Machine Translation.
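The abstract does not give the schema itself, so the following is only a minimal sketch of how synset data, per-language words, and semantic relations might be laid out in a relational store. Every table name, column name, and helper function here (`synset`, `synset_entry`, `word`, `semantic_relation`, `lemmas`, `translate`, `hypernyms`) is a hypothetical choice for illustration, as is the assumption that a single language-neutral synset identifier is shared by all languages. It uses Python's standard `sqlite3` module with an in-memory database.

```python
import sqlite3

# Hypothetical schema: NOT the paper's actual design, only an illustration.
SCHEMA = """
CREATE TABLE language (
    lang_code TEXT PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE synset (                 -- language-neutral concept (assumption)
    synset_id INTEGER PRIMARY KEY,
    pos       TEXT NOT NULL
);
CREATE TABLE synset_entry (           -- one gloss per concept per language
    synset_id INTEGER NOT NULL REFERENCES synset(synset_id),
    lang_code TEXT NOT NULL REFERENCES language(lang_code),
    gloss     TEXT,
    PRIMARY KEY (synset_id, lang_code)
);
CREATE TABLE word (                   -- lemmas belonging to a synset entry
    synset_id INTEGER NOT NULL,
    lang_code TEXT NOT NULL,
    lemma     TEXT NOT NULL,
    PRIMARY KEY (synset_id, lang_code, lemma),
    FOREIGN KEY (synset_id, lang_code)
        REFERENCES synset_entry(synset_id, lang_code)
);
CREATE TABLE semantic_relation (      -- e.g. hypernym links between concepts
    source_id INTEGER NOT NULL REFERENCES synset(synset_id),
    target_id INTEGER NOT NULL REFERENCES synset(synset_id),
    relation  TEXT NOT NULL,
    PRIMARY KEY (source_id, target_id, relation)
);
"""


def build_demo_db():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO language VALUES (?, ?)",
                     [("en", "English"), ("es", "Spanish")])
    conn.executemany("INSERT INTO synset VALUES (?, ?)",
                     [(1, "noun"), (2, "noun")])
    conn.executemany("INSERT INTO synset_entry VALUES (?, ?, ?)",
                     [(1, "en", "a domesticated canine"),
                      (1, "es", "canino domesticado"),
                      (2, "en", "a living organism that can move")])
    conn.executemany("INSERT INTO word VALUES (?, ?, ?)",
                     [(1, "en", "dog"), (1, "es", "perro"),
                      (2, "en", "animal")])
    conn.execute("INSERT INTO semantic_relation VALUES (1, 2, 'hypernym')")
    conn.commit()
    return conn


def lemmas(conn, synset_id, lang_code):
    """Return the sorted lemmas of one synset in one language."""
    rows = conn.execute(
        "SELECT lemma FROM word WHERE synset_id = ? AND lang_code = ? "
        "ORDER BY lemma", (synset_id, lang_code))
    return [r[0] for r in rows]


def translate(conn, lemma, src_lang, dst_lang):
    """Return lemmas in dst_lang that share a synset with the given lemma."""
    rows = conn.execute(
        "SELECT DISTINCT dst.lemma FROM word AS src "
        "JOIN word AS dst ON dst.synset_id = src.synset_id "
        "WHERE src.lemma = ? AND src.lang_code = ? AND dst.lang_code = ? "
        "ORDER BY dst.lemma", (lemma, src_lang, dst_lang))
    return [r[0] for r in rows]


def hypernyms(conn, synset_id):
    """Return synset ids linked from synset_id by a 'hypernym' relation."""
    rows = conn.execute(
        "SELECT target_id FROM semantic_relation "
        "WHERE source_id = ? AND relation = 'hypernym' ORDER BY target_id",
        (synset_id,))
    return [r[0] for r in rows]
```

In this sketch, a cross-lingual lookup reduces to a self-join on the shared synset identifier, which is one plausible reason a common schema could help multilingual applications; the actual structure proposed in the paper may differ.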
Related resources
An Efficient Database Design for IndoWordNet Development Using Hybrid Approach
WordNet is a crucial resource that aids in Natural Language Processing (NLP) tasks such as Machine Translation, Information Retrieval, Word Sense Disambiguation, Multi-lingual Dictionary creation, etc. The IndoWordNet is a multilingual WordNet which links WordNets of different Indian languages on a common identification number given to each concept. WordNet is designed to capture the vocabulary...
A General Overview
BalkaNet is an EC-funded project (IST-2000-29388) that started in September 2001 and will end in August 2004. It aims at developing aligned wordnets for the following Balkan languages: Bulgarian, Greek, Romanian, Serbian, and Turkish, and at extending the Czech wordnet previously developed in the EuroWordNet project. The BalkaNet project has so far delivered many useful results in the fields of both...
Mapping and Structural Analysis of Multi-lingual Wordnets
In this paper, we present observations on structural properties of wordnets of three languages: English, Hindi, and Marathi. Hindi and Marathi, spoken widely in India, rank 5th and 14th respectively in the world in terms of the number of people speaking these languages. The observations suggest the existence of the ‘small world’ property in wordnets and also lend credence to the belief that the...
Linking Korean Words with an Ontology
This paper describes our ongoing work on linking Korean word senses with the concepts of an ontology. We have few Korean wordnets which are linked to upper-level ontologies, although the need for such wordnets/ontologies has increased not only in the academic world but also in the industry. We present a method for linking Korean senses with the concepts of SmartSUMO, which uses various language...
The Automatic Mapping of Princeton WordNet Lexical-Conceptual Relations onto the Brazilian Portuguese WordNet Database
The Princeton WordNet (WN.Pr) lexical database has motivated efficient compilations of bulky relational lexicons since its inception in the 1980s. The EuroWordNet project, the first multilingual initiative built upon WN.Pr, opened up ways of building individual wordnets and interrelating them by means of the so-called Inter-Lingual-Index, an unstructured list of the WN.Pr synsets. Other important...